[Model] feat: FastGen DMD2-distilled Wan 2.1 pipelines (T2V, I2V) by ayushag-nv · Pull Request #2749 · vllm-project/vllm-omni

ayushag-nv · 2026-04-13T20:08:45Z

Purpose

Add support for FastGen DMD2-distilled 4-step models in vllm-omni. FastGen (NVlabs/FastGen) is NVIDIA's framework for training fast generative models via distillation, including DMD2.

DMD2-distilled models run in a small, fixed number of inference steps with guidance baked in (CFG not needed), substantially faster than the multi-step teacher while reusing the base model's text encoder / VAE / tokenizer.

Shared DMD2 module

DMD2EulerScheduler (diffusion/models/schedulers/scheduling_dmd2_euler.py) — subclass of FlowMatchEulerDiscreteScheduler that always returns the fixed DMD2 training timestep schedule, ignoring caller-passed num_inference_steps / sigmas / mu.
DMD2PipelineMixin (diffusion/models/dmd2/mixin.py) — reads dmd2_denoising_timesteps, dmd2_num_inference_steps, dmd2_guidance_scale, dmd2_scheduler_shift from model_index.json; sanitizes incoming requests to drop CFG / negative-prompt fields that don't apply.
_load_json moved to diffusion/models/utils.py as a shared helper (previously duplicated inline across pipelines).
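
A minimal sketch of the fixed-schedule idea described above, under stated assumptions: `BaseScheduler` is a hypothetical stand-in (not the real `FlowMatchEulerDiscreteScheduler`), and the timestep values are illustrative rather than taken from any checkpoint. It only shows how a subclass can ignore caller-passed arguments and always install the same schedule.

```python
from typing import Optional, Sequence


class BaseScheduler:
    """Hypothetical stand-in for the real base scheduler."""

    def __init__(self) -> None:
        self.timesteps = []

    def set_timesteps(
        self,
        num_inference_steps: Optional[int] = None,
        timesteps: Optional[Sequence[float]] = None,
    ) -> None:
        if timesteps is not None:
            self.timesteps = [float(t) for t in timesteps]
        elif num_inference_steps is not None:
            # Evenly spaced schedule starting at 1000 and moving towards 0.
            step = 1000.0 / num_inference_steps
            self.timesteps = [1000.0 - i * step for i in range(num_inference_steps)]
        else:
            raise ValueError("pass num_inference_steps or timesteps")


class FixedScheduleScheduler(BaseScheduler):
    """Always installs the schedule given at construction time."""

    def __init__(self, fixed_timesteps: Sequence[float]) -> None:
        super().__init__()
        self._fixed = tuple(float(t) for t in fixed_timesteps)

    def set_timesteps(self, num_inference_steps=None, timesteps=None) -> None:
        # Caller-passed arguments are deliberately ignored.
        super().set_timesteps(timesteps=self._fixed)


# Illustrative values only; a real schedule would come from the checkpoint config.
scheduler = FixedScheduleScheduler([999.0, 937.0, 833.0, 624.0])
scheduler.set_timesteps(num_inference_steps=50)
print(scheduler.timesteps)  # [999.0, 937.0, 833.0, 624.0]
```

Because the override lives on the scheduler class itself, nothing has to be patched or restored per request.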

Model-specific stubs

Per-base-family 5-line classes composing the mixin with the existing base pipeline:

WanT2VDMD2Pipeline(DMD2PipelineMixin, Wan22Pipeline)
WanI2VDMD2Pipeline(DMD2PipelineMixin, Wan22I2VPipeline)
LTX2T2VDMD2Pipeline(DMD2PipelineMixin, LTX2Pipeline)
LTX2I2VDMD2Pipeline(DMD2PipelineMixin, LTX2ImageToVideoPipeline)

All four are registered in diffusion/registry.py (pipeline + post-process) and exported from their respective package __init__.py.

Future DMD2 variants (additional Wan 2.x / LTX-2 / LTX-2.3 checkpoints) only need a new model_index.json — no code change required. New base-pipeline families require one ~5-line stub + registry entries.
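
The composition described above can be sketched roughly as follows. This is an illustration, not the PR's code: `BasePipeline`, `DistilledMixin`, `ExampleDMD2Pipeline`, and `parse_dmd2_config` are hypothetical names, and the config values are made up. Only the four `dmd2_*` key names come from the description.

```python
import json
from dataclasses import dataclass


@dataclass(frozen=True)
class DMD2Config:
    denoising_timesteps: tuple
    num_inference_steps: int
    guidance_scale: float
    scheduler_shift: float


def parse_dmd2_config(model_index: dict) -> DMD2Config:
    """Read the dmd2_* keys; a missing key raises KeyError rather than guessing."""
    return DMD2Config(
        denoising_timesteps=tuple(model_index["dmd2_denoising_timesteps"]),
        num_inference_steps=int(model_index["dmd2_num_inference_steps"]),
        guidance_scale=float(model_index["dmd2_guidance_scale"]),
        scheduler_shift=float(model_index["dmd2_scheduler_shift"]),
    )


class BasePipeline:
    """Hypothetical stand-in for an existing base pipeline."""

    def __init__(self, model_index: dict) -> None:
        self.model_index = model_index

    def forward(self, num_inference_steps: int, guidance_scale: float) -> dict:
        return {"steps": num_inference_steps, "guidance_scale": guidance_scale}


class DistilledMixin:
    """Overrides sampling settings with the values stored in the config."""

    def __init__(self, model_index: dict) -> None:
        super().__init__(model_index)
        self.dmd2 = parse_dmd2_config(model_index)

    def forward(self, num_inference_steps=None, guidance_scale=None) -> dict:
        # Caller values are replaced by the distilled model's fixed settings.
        return super().forward(self.dmd2.num_inference_steps, self.dmd2.guidance_scale)


class ExampleDMD2Pipeline(DistilledMixin, BasePipeline):
    """Stub: the mixin comes first in the MRO so its forward() wins."""


# Illustrative config values only.
model_index = json.loads(
    '{"dmd2_denoising_timesteps": [999, 937, 833, 624],'
    ' "dmd2_num_inference_steps": 4,'
    ' "dmd2_guidance_scale": 1.0,'
    ' "dmd2_scheduler_shift": 5.0}'
)
result = ExampleDMD2Pipeline(model_index).forward(num_inference_steps=50, guidance_scale=7.5)
print(result)  # {'steps': 4, 'guidance_scale': 1.0}
```

In this arrangement a new variant of an already supported base family differs only in its config file, which is the property the description relies on.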

Test Plan

uv run pytest tests/diffusion/models/dmd2/ -v

Plus a local end-to-end smoke test: vllm-omni serve <wan-dmd2-ckpt> --omni and a POST /v1/videos curl round-trip.

Test Result

Unit tests parametrized over all four DMD2 pipelines (Wan T2V/I2V, LTX-2 T2V/I2V) — all passing.

Signed-off-by: ayushag <ayushag@nvidia.com>

chatgpt-codex-connector · 2026-04-13T20:08:52Z

Codex usage limits have been reached for code reviews. Please check with the admins of this repo to increase the limits by adding credits.
Credits must be used to enable repository wide code reviews.

hsliuustc0106 · 2026-04-13T21:06:12Z

PR #2749 - [WIP] feat: fastgen model integration

OVERALL: WIP (not blocking)
VERDICT: COMMENT

Ready for full review when WIP removed. Preliminary scan:

fastgen model integration for Wan2.2. 7 files, 375+ LOC. pre-commit failing (gate blocker). PR body empty. Fix pre-commit and add description before requesting review.

hsliuustc0106 · 2026-04-14T09:37:19Z

BLOCKER scan:

This PR is marked as [WIP] and has failing pre-commit checks. Please:

Fix the pre-commit issues
Set the PR to draft status or remove the [WIP] tag when ready for review

OVERALL: WIP WITH FAILING CHECKS

VERDICT: REQUEST_CHANGES

lishunyang12

Early review (WIP) -- fastgen DMD2 model integration

Thanks for the PR. The overall approach of subclassing the existing Wan 2.2 T2V / I2V pipelines for DMD2-distilled 4-step models is reasonable. Here are observations for the current state:

1. Heavy code duplication between T2V and I2V DMD2 classes

WanT2VDMD2Pipeline and WanI2VDMD2Pipeline share identical implementations for:

__init__ (scheduler replacement)
_verify_dmd2_request (entire method, ~40 lines)
forward timestep-patching logic (monkey-patch + try/finally)
Class constants (GUIDANCE_SCALE, NUM_INFERENCE_STEPS, DMD2_TIMESTEPS)

Please extract this into a mixin (e.g. DMD2PipelineMixin) that both classes inherit from. This avoids the maintenance burden of keeping two copies in sync and makes it trivial to add future DMD2 variants (e.g. VACE).

2. Monkey-patching `scheduler.set_timesteps` is fragile

In forward(), you replace self.scheduler.set_timesteps with a closure and restore it in a finally block. This approach has problems:

Not thread-safe / async-safe: if two requests are in flight concurrently on the same pipeline instance, they will race on the scheduler's method.
Fragile against parent refactors: if the parent ever calls set_timesteps differently (e.g. via a local reference), the patch silently breaks.

Consider instead:

Overriding the parent's denoising loop method directly, or
Calling self.scheduler.set_timesteps(timesteps=self.DMD2_TIMESTEPS, device=...) once after the parent calls it (by wrapping only the relevant section), or
Using FlowMatchEulerDiscreteScheduler's built-in timesteps parameter at set_timesteps time and overriding num_inference_steps in the super call.
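
To make the "fragile against parent refactors" point concrete, here is a small self-contained sketch. All classes are hypothetical stand-ins, not the PR's code, and the `FIXED` values are illustrative. The parent caches a reference to `set_timesteps`, so the per-call monkey-patch is silently bypassed, while a scheduler subclass keeps working.

```python
FIXED = [999, 750, 500, 250]  # illustrative fixed schedule


class Scheduler:
    def set_timesteps(self, n: int) -> None:
        self.timesteps = list(range(n, 0, -1))


class FixedScheduler(Scheduler):
    def set_timesteps(self, n: int) -> None:
        # Ignores n and always installs the fixed schedule.
        self.timesteps = list(FIXED)


class ParentPipeline:
    def __init__(self, scheduler: Scheduler) -> None:
        self.scheduler = scheduler
        # A refactor like this caches the bound method once.
        self._set_timesteps = scheduler.set_timesteps

    def forward(self, n: int) -> list:
        self._set_timesteps(n)
        return self.scheduler.timesteps


class PatchingPipeline(ParentPipeline):
    def forward(self, n: int) -> list:
        original = self.scheduler.set_timesteps

        def patched(_n: int) -> None:
            self.scheduler.timesteps = list(FIXED)

        self.scheduler.set_timesteps = patched
        try:
            # The parent uses its cached reference, so `patched` never runs.
            return super().forward(n)
        finally:
            self.scheduler.set_timesteps = original


patched_result = PatchingPipeline(Scheduler()).forward(6)
subclass_result = ParentPipeline(FixedScheduler()).forward(6)
print(patched_result)   # [6, 5, 4, 3, 2, 1]
print(subclass_result)  # [999, 750, 500, 250]
```

The subclass variant also avoids the concurrency concern, since no shared attribute is swapped and restored while a request is in flight.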

3. `NUM_INFERENCE_STEPS` class constant is declared but never used

You define NUM_INFERENCE_STEPS = 4 on both classes but never reference it. The forward() signature defaults to num_inference_steps: int = 4 as a separate literal. Either use the constant in the default or remove it to avoid confusion.
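
One way to wire the constant into the signature, sketched with a hypothetical `ExamplePipeline` class. Note that a default written as `= NUM_INFERENCE_STEPS` is evaluated once in the class body, so a subclass that overrides the constant does not change it; a `None` sentinel resolves the value at call time instead.

```python
class ExamplePipeline:
    NUM_INFERENCE_STEPS = 4

    def forward(self, num_inference_steps: int = NUM_INFERENCE_STEPS) -> int:
        # Default is bound when the class body executes.
        return num_inference_steps

    def forward_late(self, num_inference_steps=None) -> int:
        # Default is looked up on the instance at call time.
        if num_inference_steps is None:
            num_inference_steps = self.NUM_INFERENCE_STEPS
        return num_inference_steps


class EightStepPipeline(ExamplePipeline):
    NUM_INFERENCE_STEPS = 8


print(ExamplePipeline().forward())         # 4
print(EightStepPipeline().forward())       # 4
print(EightStepPipeline().forward_late())  # 8
```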

4. `_verify_dmd2_request` mutates the request in-place silently

The method modifies req.sampling_params and req.prompts in-place. While there are log warnings, consider:

Documenting in the method docstring that the request is mutated (not just verified -- the name _verify_* is misleading for a method that modifies data). A name like _sanitize_dmd2_request would be more accurate.
Returning the modified request or a copy, so callers see that mutation happens. In-place mutation of shared request objects can cause subtle bugs if the same request is reused.
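
A minimal sketch of the copy-returning variant, assuming simplified stand-in types: `Request` and `SamplingParams` here are hypothetical dataclasses, not the real `OmniDiffusionRequest` / `OmniDiffusionSamplingParams`, and the field defaults are illustrative.

```python
from dataclasses import dataclass, field, replace


@dataclass
class SamplingParams:
    guidance_scale: float = 5.0
    num_inference_steps: int = 50


@dataclass
class Request:
    prompts: list = field(default_factory=list)
    sampling_params: SamplingParams = field(default_factory=SamplingParams)


def sanitize_request(req: Request, guidance_scale: float, num_inference_steps: int) -> Request:
    """Return a sanitized copy; the caller's request is left untouched."""
    prompts = []
    for p in req.prompts:
        if isinstance(p, dict):
            # Build a new dict without the negative prompt instead of deleting in place.
            p = {k: v for k, v in p.items() if k != "negative_prompt"}
        prompts.append(p)
    params = replace(
        req.sampling_params,
        guidance_scale=guidance_scale,
        num_inference_steps=num_inference_steps,
    )
    return replace(req, prompts=prompts, sampling_params=params)


original = Request(prompts=[{"prompt": "a cat", "negative_prompt": "blurry"}])
clean = sanitize_request(original, guidance_scale=1.0, num_inference_steps=4)
print(clean.prompts)         # [{'prompt': 'a cat'}]
print(original.prompts)      # [{'prompt': 'a cat', 'negative_prompt': 'blurry'}]
```

Returning a new object makes the mutation visible at the call site, at the cost of one shallow copy per request.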

5. `p.get("negative_prompt")` is falsy for empty strings

In _verify_dmd2_request:

if isinstance(p, dict) and p.get("negative_prompt"):

This will not strip negative_prompt if its value is "" (empty string). If the intent is to remove the key entirely when present, use "negative_prompt" in p instead.
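
The difference is easy to check in isolation. The helper below is a hypothetical illustration, not the PR's method:

```python
def strip_negative_prompt(p: dict) -> dict:
    """Remove the key whenever it is present, regardless of its value."""
    if isinstance(p, dict) and "negative_prompt" in p:
        p = {k: v for k, v in p.items() if k != "negative_prompt"}
    return p


prompt = {"prompt": "a cat", "negative_prompt": ""}

truthy_check = bool(prompt.get("negative_prompt"))  # False: empty string is falsy
membership_check = "negative_prompt" in prompt      # True: the key exists

print(truthy_check, membership_check)  # False True
print(strip_negative_prompt(prompt))   # {'prompt': 'a cat'}
```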

6. Test helper uses `object.__new__` to skip `__init__` -- brittle

In test_wan_dmd2_request_sanitization.py, _make_pipeline does:

pipeline = object.__new__(cls)
torch.nn.Module.__init__(pipeline)

This creates a pipeline without running the pipeline's own __init__ (only torch.nn.Module.__init__ runs), so _verify_dmd2_request works only because it doesn't touch any instance state set there. The method already accesses self.GUIDANCE_SCALE, which resolves only because that is a class attribute. This is fragile -- consider using the mock-based approach from the scheduler test file instead, which is more robust.
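
A small sketch of why this is brittle, using a hypothetical `Pipeline` class (not the real one) with an illustrative constant: class attributes resolve on an instance built with `object.__new__`, but anything assigned in `__init__` does not exist.

```python
class Pipeline:
    GUIDANCE_SCALE = 1.0  # class attribute, illustrative value

    def __init__(self) -> None:
        self.scheduler = "configured"  # instance attribute set only by __init__

    def uses_class_attr(self) -> float:
        return self.GUIDANCE_SCALE

    def uses_instance_attr(self) -> str:
        return self.scheduler


bare = object.__new__(Pipeline)  # __init__ is never called

class_attr_value = bare.uses_class_attr()  # works: resolved on the class

try:
    bare.uses_instance_attr()
    instance_attr_error = None
except AttributeError as exc:
    instance_attr_error = type(exc).__name__

print(class_attr_value)     # 1.0
print(instance_attr_error)  # AttributeError
```

A test built this way passes today and starts failing as soon as the method under test reads any state that `__init__` would have set.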

7. Missing `import` in scheduler test

In test_wan_dmd2_scheduler.py, line 40:

from vllm_omni.diffusion.request import OmniDiffusionRequest, OmniDiffusionSamplingParams

This import is placed in the middle of the file (after fixtures, before test functions). Move it to the top with the other imports for consistency.

8. Unused import in scheduler test

import inspect on line 2 of test_wan_dmd2_scheduler.py is never used. Remove it.

9. Registry entries look correct

The additions to registry.py (pipeline registry, pre/post-process function mappings) look correct and consistent with existing entries. The DMD2 pipelines appropriately reuse the same pre/post-process functions as their parent pipelines.

10. PR description is empty

Please fill in the Purpose, Test Plan, and Test Result sections. Even for WIP, a brief description of the DMD2 distillation approach and which models this targets (model hub IDs) would help reviewers.

Overall, the core idea is sound. The main actionable items before this is ready for merge are: (1) extract the duplicated DMD2 logic into a mixin, (2) find a less fragile approach than monkey-patching set_timesteps, and (3) minor cleanup items above.

ayushag-nv · 2026-04-16T19:47:59Z

@lishunyang12 Can you review once again?

linyueqian · 2026-04-16T20:15:25Z

@princepride PTAL

ayushag-nv · 2026-04-16T23:35:21Z

There will be more DMD2 models built on different base models, and they will all use DMD2PipelineMixin. Does it make sense to move these into a separate dmd2 folder under diffusion/models? wdyt?

hsliuustc0106 · 2026-04-17T00:04:01Z

> There will be more DMD2 models built on different base models, and they will all use DMD2PipelineMixin. Does it make sense to move these into a separate dmd2 folder under diffusion/models? wdyt?

let's wait until more models:) this deserves a new RFC

hsliuustc0106 · 2026-04-17T00:06:36Z

Can you run the diffusion benchmark and paste the results?

ayushag-nv · 2026-04-17T00:15:33Z

> let's wait until more models:) this deserves a new RFC

We have more models in progress. I will write a lightweight RFC and update here. I am adding classes for other models as well.

ayushag-nv · 2026-04-20T21:16:20Z

@hsliuustc0106 Thanks for merging this. I will follow up on this PR with required documentation and some benchmarks.

ayushag-nv added 3 commits April 13, 2026 12:30

chore: t2v pipeline for wan2.1 dmd2p

76b8571

Signed-off-by: ayushag <ayushag@nvidia.com>

chore: i2v pipeline for wan 2.1 dmd2p

3914181

Signed-off-by: ayushag <ayushag@nvidia.com>

chore: added unit tests

73c88df

Signed-off-by: ayushag <ayushag@nvidia.com>

ayushag-nv requested a review from hsliuustc0106 as a code owner April 13, 2026 20:08

ayushag-nv marked this pull request as draft April 13, 2026 20:08

lishunyang12 reviewed Apr 16, 2026

chore: mixin based architecture + fixes

3176727

Signed-off-by: ayushag <ayushag@nvidia.com>

ayushag-nv marked this pull request as ready for review April 16, 2026 19:42

chore: merge upstream/main and resolve conflicts

11bd9a4

ayushag-nv changed the title ~~[WIP] feat: fastgen model integration~~ [Model] feat: FastGen DMD2-distilled Wan 2.1 pipelines (T2V, I2V) Apr 16, 2026

chore: unified extensible structure

a7fa2b3

Signed-off-by: ayushag <ayushag@nvidia.com>

chore: util cleanup

45aea7c

Signed-off-by: ayushag <ayushag@nvidia.com>

hsliuustc0106 reviewed Apr 17, 2026

Comment thread tests/diffusion/models/wan2_2/test_wan_dmd2_scheduler.py Outdated

chore: ltx2 add

1a970c8

Signed-off-by: ayushag <ayushag@nvidia.com>

hsliuustc0106 added the ready (label to trigger buildkite CI) and diffusion-x2v-test (label to trigger buildkite x2video series of diffusion models test in nightly CI) labels Apr 18, 2026

hsliuustc0106 approved these changes Apr 20, 2026

hsliuustc0106 merged commit dc8a9e2 into vllm-project:main Apr 20, 2026
8 checks passed

ayushag-nv mentioned this pull request Apr 21, 2026

enhancement: extend to dmd2 to image generation + add flux, qwen image pipelines #2974

Merged

nainiu258 pushed a commit to nainiu258/vllm-omni that referenced this pull request Apr 21, 2026

[Model] feat: FastGen DMD2-distilled Wan 2.1 pipelines (T2V, I2V) (vl…

086b194

…lm-project#2749) Signed-off-by: ayushag <ayushag@nvidia.com> Signed-off-by: nainiu258 <cperfect02@163.com>

qinganrice pushed a commit to qinganrice/vllm-omni that referenced this pull request Apr 23, 2026

[Model] feat: FastGen DMD2-distilled Wan 2.1 pipelines (T2V, I2V) (vl…

cd8e5ed

…lm-project#2749) Signed-off-by: ayushag <ayushag@nvidia.com>

lengrongfu pushed a commit to lengrongfu/vllm-omni that referenced this pull request May 1, 2026

[Model] feat: FastGen DMD2-distilled Wan 2.1 pipelines (T2V, I2V) (vl…

03e584c

…lm-project#2749) Signed-off-by: ayushag <ayushag@nvidia.com>

clodaghwalsh17 pushed a commit to clodaghwalsh17/nm-vllm-omni-ent that referenced this pull request May 12, 2026

[Model] feat: FastGen DMD2-distilled Wan 2.1 pipelines (T2V, I2V) (vl…

2748a45

…lm-project#2749) Signed-off-by: ayushag <ayushag@nvidia.com>

daixinning pushed a commit to daixinning/vllm-omni that referenced this pull request May 28, 2026

[Model] feat: FastGen DMD2-distilled Wan 2.1 pipelines (T2V, I2V) (vl…

6e92076

…lm-project#2749) Signed-off-by: ayushag <ayushag@nvidia.com>

quyifei23 pushed a commit to quyifei23/vllm-omni that referenced this pull request Jun 6, 2026

[Model] feat: FastGen DMD2-distilled Wan 2.1 pipelines (T2V, I2V) (vl…

4f4b8e0

…lm-project#2749) Signed-off-by: ayushag <ayushag@nvidia.com>
